Toolformer: Language Models Can Teach Themselves to Use Tools
We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. (Abstract)
self-supervised
We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar.
Figure 1
The LLM itself can reach the correct answer by using tools such as Google search or a calculator
2 Approach
Figure 2: Key steps in our approach
1. Sample API Calls
Figure 3
API calls are marked with special tokens such as [qa]
Appendix A.2
2. Execute API Calls
3. Filter API Calls
Converts a plain LM dataset into one augmented with API calls
-> fine-tune the model on the augmented dataset
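The filtering step (step 3) keeps an API call only if prepending the call together with its result lowers the model's weighted loss over the following tokens by at least a threshold tau, compared with the better of no call and the call without its result. A sketch of that criterion, where `loss_fn` is a hypothetical helper returning the LM's weighted cross-entropy loss over `tokens` given a `prefix`:

```python
def keep_api_call(loss_fn, tokens, i, call, call_with_result, tau=1.0):
    """Return True if the API call at position i is useful enough to keep."""
    # Loss with the executed call (input and result) prefixed: L_i^+
    l_plus = loss_fn(prefix=call_with_result, tokens=tokens[i:])
    # Baseline: the better of no call at all and the call without result: L_i^-
    l_minus = min(
        loss_fn(prefix="", tokens=tokens[i:]),
        loss_fn(prefix=call, tokens=tokens[i:]),
    )
    # Keep the call only if it reduces the loss by at least tau.
    return l_minus - l_plus >= tau
```

Only calls passing this check survive into the augmented dataset used for fine-tuning, so the model learns to emit calls exactly where they help prediction.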
3 Tools
Question Answering
Calculator
Wikipedia Search
Machine Translation System
Calendar
Table 1
4 Experiments
we use a subset of CCNet as our language modeling dataset C and GPT-J as our language model M (4.1)
We evaluate our models on the SQuAD, Google-RE and T-REx subsets of the LAMA benchmark (4.2.1)
Table 3
Toolformer performs best, surpassing even GPT-3
Table 8: also confirms that Toolformer does not degrade perplexity (4.3)
Community implementations